A hidden Markov random field model for genome-wide association studies.
نویسندگان
چکیده
Genome-wide association studies (GWAS) are increasingly utilized for identifying novel susceptible genetic variants for complex traits, but there is little consensus on analysis methods for such data. Most commonly used methods include single single nucleotide polymorphism (SNP) analysis or haplotype analysis with Bonferroni correction for multiple comparisons. Since the SNPs in typical GWAS are often in linkage disequilibrium (LD), at least locally, Bonferroni correction of multiple comparisons often leads to conservative error control and therefore lower statistical power. In this paper, we propose a hidden Markov random field model (HMRF) for GWAS analysis based on a weighted LD graph built from the prior LD information among the SNPs and an efficient iterative conditional mode algorithm for estimating the model parameters. This model effectively utilizes the LD information in calculating the posterior probability that an SNP is associated with the disease. These posterior probabilities can then be used to define a false discovery controlling procedure in order to select the disease-associated SNPs. Simulation studies demonstrated the potential gain in power over single SNP analysis. The proposed method is especially effective in identifying SNPs with borderline significance at the single-marker level that nonetheless are in high LD with significant SNPs. In addition, by simultaneously considering the SNPs in LD, the proposed method can also help to reduce the number of false identifications of disease-associated SNPs. We demonstrate the application of the proposed HMRF model using data from a case-control GWAS of neuroblastoma and identify 1 new SNP that is potentially associated with neuroblastoma.
منابع مشابه
graph-GPA: A graphical model for prioritizing GWAS results and investigating pleiotropic architecture
Genome-wide association studies (GWAS) have identified tens of thousands of genetic variants associated with hundreds of phenotypes and diseases, which have provided clinical and medical benefits to patients with novel biomarkers and therapeutic targets. However, identification of risk variants associated with complex diseases remains challenging as they are often affected by many genetic varia...
متن کاملIncreasing the Efficiency of Genome-wide Association Mapping via Hidden Markov Models
With the rapid production of high dimensional genetic data, one major challenge in genome-wide association studies is to develop effective and efficient statistical tools to resolve the low power problem of detecting causal SNPs with low to moderate susceptibility, whose effects are often obscured by substantial background noises. Here we present a novel method that serves as an optimal techniq...
متن کاملPredicting CpG Islands and Their Relationship with Genomic Feature in Cattle by Hidden Markov Model Algorithm
Cattle supply an important source of nutrition for humans in the world. CpG islands (CGIs) are very important and useful, as they carry functionally relevant epigenetic loci for whole genome studies. As a matter of fact, there have been no formal analyses of CGIs at the DNA sequence level in cattle genomes and therefore this study was carried out to fill the gap. We used hidden markov model alg...
متن کاملGraphical-model Based Multiple Testing under Dependence, with Applications to Genome-wide Association Studies
Large-scale multiple testing tasks often exhibit dependence, and leveraging the dependence between individual tests is still one challenging and important problem in statistics. With recent advances in graphical models, it is feasible to use them to perform multiple testing under dependence. We propose a multiple testing procedure which is based on a Markov-random-field-coupled mixture model. T...
متن کاملIncorporating Biological Pathways via a Markov Random Field Model in Genome-Wide Association Studies
Genome-wide association studies (GWAS) examine a large number of markers across the genome to identify associations between genetic variants and disease. Most published studies examine only single markers, which may be less informative than considering multiple markers and multiple genes jointly because genes may interact with each other to affect disease risk. Much knowledge has been accumulat...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Biostatistics
دوره 11 1 شماره
صفحات -
تاریخ انتشار 2010